Estimating Disaggregated Employment Size from Points-of-Interest and Census Data: From Mining the Web to Model Implementation and Visualization

Authors

  • Filipe Rodrigues
  • Ana Alves
  • Evgheni Polisciuc
  • Shan Jiang
  • Joseph Ferreira
  • Francisco C. Pereira
Abstract

The global spread of internet access and the ubiquity of internet-capable devices have led to an increased online presence on the part of companies and businesses, notably in collaborative platforms called local directories, where Points-of-Interest (POIs) are usually classified with a set of categories and tags. Such information can be extremely useful, especially if aggregated under a common (shared) taxonomy. This article proposes a complete framework for the urban planning task of disaggregated employment size estimation based on collaborative online POI data collected using web mining techniques. To make the analysis possible, we present a machine learning approach that automatically classifies POIs into a common taxonomy, the North American Industry Classification System (NAICS). This hierarchical taxonomy is applied in many areas, particularly in urban planning, since it allows the data to be analyzed at different levels of detail, depending on the practical application at hand. The classified POIs are then used to estimate disaggregated employment size, at a finer level than previously possible, using a maximum likelihood estimator. We empirically show that the automatically classified online POIs are competitive with proprietary gold-standard POI data. This finding is then supported through a set of new visualizations that allow us to understand the spatial distribution of the classification error and its relation to the employment size error.

Keywords: machine learning, spatial analysis, points-of-interest, urban planning, GIS.
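To make the classification step concrete, the following is a minimal sketch of how POI tags could be mapped to NAICS sector codes. It is not the authors' method: the abstract does not name the learning algorithm, so a multinomial naive Bayes classifier with Laplace smoothing is swapped in purely for illustration. The training examples, the tag vocabulary, and the function names `train_naive_bayes` and `classify` are all hypothetical. The two-digit codes used are real NAICS sectors (72 Accommodation and Food Services, 44-45 Retail Trade, 62 Health Care and Social Assistance).

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count class and tag frequencies from (tags, naics_code) pairs."""
    class_counts = Counter()
    tag_counts = defaultdict(Counter)
    vocab = set()
    for tags, code in examples:
        class_counts[code] += 1
        for tag in tags:
            tag_counts[code][tag] += 1
            vocab.add(tag)
    return class_counts, tag_counts, vocab


def classify(model, tags):
    """Return the NAICS code with the highest posterior log-score."""
    class_counts, tag_counts, vocab = model
    total = sum(class_counts.values())
    best_code, best_score = None, -math.inf
    for code in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[code] / total)
        # Laplace-smoothed denominator: tag occurrences in class + |vocab|.
        denom = sum(tag_counts[code].values()) + len(vocab)
        for tag in tags:
            if tag in vocab:  # tags never seen in training are ignored
                score += math.log((tag_counts[code][tag] + 1) / denom)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical toy training set: POI tags labeled with NAICS sectors.
TRAINING = [
    (["restaurant", "pizza", "italian"], "72"),
    (["cafe", "coffee", "bakery"], "72"),
    (["hotel", "lodging"], "72"),
    (["grocery", "supermarket", "food"], "44-45"),
    (["clothing", "shoes", "boutique"], "44-45"),
    (["dentist", "clinic"], "62"),
    (["hospital", "clinic", "emergency"], "62"),
]

model = train_naive_bayes(TRAINING)
print(classify(model, ["pizza", "coffee"]))
print(classify(model, ["shoes", "boutique"]))
```

A real pipeline would likely train on many more labeled POIs and predict at deeper levels of the NAICS hierarchy. The sketch only shows the basic idea of mapping directory tags onto a shared taxonomy before any employment estimation takes place.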





Publication date: 2013